1. Field of the Invention
The present invention relates to a document recognition device which automatically recognizes documents, a document recognition method, a program, and a storage medium.
2. Description of the Related Art
It is advantageous to carry out processing to automatically recognize that an input document (or form) corresponds to one of a plurality of registered documents that were previously registered as templates.
In document recognition processing, features are extracted from data of a document image read by a scanner or the like to generate document data, the degree of similarity between the input document and each registered document is obtained, and the registered document having the highest degree of similarity is recognized as corresponding to the input document.
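The matching step described above can be illustrated with a minimal sketch. The feature vectors, the template names, and the use of cosine similarity are assumptions made purely for illustration; the description above does not specify which features or which similarity measure are used.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def recognize(input_features, registered):
    """Return (name, score) of the registered document most similar to the input."""
    best_name, best_score = None, -1.0
    for name, features in registered.items():
        score = cosine_similarity(input_features, features)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical templates; each vector stands in for extracted document data.
registered = {
    "invoice": [4.0, 10.0, 2.0],
    "order_form": [1.0, 3.0, 8.0],
}
result = recognize([4.0, 9.0, 2.0], registered)
print(result[0])
```

In this sketch the input is assigned to whichever template scores highest, even if that score is low; a practical system would presumably also apply a rejection threshold.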
The following conventional document recognition methods are known.
According to Japanese Patent Laid-Open No. 2000-285187, the tables in the document are focused on, and the ratio of the area of each table to the total area of all tables is used as a degree of similarity that is visually recognized by a human. In this document recognition process, a higher degree of similarity is obtained for a document image having lines of similar shape, and the document is then identified from among the registered documents by comparing the degrees of similarity.
However, in the document recognition process according to Japanese Patent Laid-Open No. 2000-285187, even when the shape features of a search document are approximately equal to those of the registered document, the document is not recognized if the colors of the document images, e.g., the colors of the lines, differ from each other. For example, when the original document is in color (not monochrome), a copy of it made by a monochrome copying machine is not recognized because the copy does not have color information.
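As a rough illustration of a table-area-ratio feature, the following sketch computes, for each table, its share of the total table area and compares two documents by those shares. The representation of a table as a (width, height) pair, the function names, and the comparison formula are all assumptions for illustration; the cited publication's actual computation is not reproduced here.

```python
def area_ratios(tables):
    """Share of total table area held by each table, largest first.

    `tables` is a list of (width, height) pairs (an assumed representation).
    """
    areas = [w * h for w, h in tables]
    total = sum(areas)
    if total == 0:
        return []
    return sorted((a / total for a in areas), reverse=True)


def ratio_similarity(ratios_a, ratios_b):
    """Similarity in [0, 1] between two ratio lists (1.0 means identical)."""
    n = max(len(ratios_a), len(ratios_b))
    a = ratios_a + [0.0] * (n - len(ratios_a))  # pad the shorter list with zeros
    b = ratios_b + [0.0] * (n - len(ratios_b))
    # The L1 distance between two such lists is at most 2, so halve it.
    return 1.0 - 0.5 * sum(abs(x - y) for x, y in zip(a, b))


input_doc = area_ratios([(100, 50), (100, 150)])
same_layout_scaled = area_ratios([(200, 100), (200, 300)])
single_table = area_ratios([(100, 100)])

sim_same = ratio_similarity(input_doc, same_layout_scaled)
sim_diff = ratio_similarity(input_doc, single_table)
print(sim_same, sim_diff)
```

Because only area shares are compared, the sketch is insensitive to uniform scaling of the page: the scaled layout scores higher than the single-table layout.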
According to Japanese Patent Laid-Open No. 2001-109842, the document recognition process is performed by focusing on color information at a specific part of the document (color ID region having a document ID).
However, the document recognition process according to Japanese Patent Laid-Open No. 2001-109842 has a disadvantage in that the shape of the form is limited because the form must have the form ID region containing the form ID.
Further, according to Japanese Patent Laid-Open No. 2000-285190, a plurality of features of the document image are used; that is, the document is not determined based on a single feature. Document candidates are first narrowed based on some of the features, and the document is thereafter recognized based on other features.
However, according to Japanese Patent Laid-Open No. 2000-285190, when the document is not determined based on the features used, the document candidates are sequentially narrowed using other features. Here, the disadvantage is that the recognition result is inconsistent and varies depending on the order of the features used for determination because the features are sequentially used.
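The order dependence noted above can be demonstrated with a small hypothetical example. The narrowing rule (keep the closest candidates at each step, then pick one at the last step), the feature names, and all values below are assumptions for illustration and are not taken from the cited publication.

```python
def narrow_sequentially(input_doc, candidates, feature_order, keep=2):
    """Narrow candidates one feature at a time and return the final survivor.

    At every step but the last, the `keep` candidates closest to the input on
    the current feature survive; the last step keeps only the closest one.
    """
    remaining = list(candidates)
    last_step = len(feature_order) - 1
    for step, feature in enumerate(feature_order):
        remaining.sort(
            key=lambda name: abs(candidates[name][feature] - input_doc[feature])
        )
        remaining = remaining[:1] if step == last_step else remaining[:keep]
    return remaining[0]


# Hypothetical scalar features for the input and three registered forms.
input_doc = {"lines": 10, "color": 10}
candidates = {
    "form_A": {"lines": 10, "color": 15},  # best match on lines
    "form_B": {"lines": 19, "color": 19},  # poor match on both features
    "form_C": {"lines": 15, "color": 10},  # best match on color
}

lines_first = narrow_sequentially(input_doc, candidates, ["lines", "color"])
color_first = narrow_sequentially(input_doc, candidates, ["color", "lines"])
print(lines_first, color_first)
```

With identical input and identical registered data, the two feature orders select different forms, which is the inconsistency described above.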
Further, according to Japanese Patent Laid-Open No. 9-16714, a standard document is created and registered based on a color range which is set by a user on a multi-color document image. Then, the type of an input document is determined by comparing the input document with the standard document.
However, in this method, the kind of color information that can be used as a determining reference in the document recognition is limited because document registration requires the color range to be designated in advance.